
torchvision.models¶

The models subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection and video classification.

Note

Backward compatibility is guaranteed for loading a serialized state_dict into a model created with an older version of PyTorch. By contrast, loading entire saved models or serialized ScriptModules (serialized with older versions of PyTorch) may not preserve their historic behaviour. Refer to the PyTorch documentation on serialization semantics for details.

Classification¶

The models subpackage contains definitions for the following model architectures for image classification:

AlexNet

VGG

ResNet

SqueezeNet

DenseNet

Inception v3

GoogLeNet

ShuffleNet v2

MobileNetV2

MobileNetV3

ResNeXt

Wide ResNet

MNASNet

EfficientNet

RegNet

You can construct a model with random weights by calling its constructor:

import torchvision.models as models

resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
shufflenet = models.shufflenet_v2_x1_0()
mobilenet_v2 = models.mobilenet_v2()
mobilenet_v3_large = models.mobilenet_v3_large()
mobilenet_v3_small = models.mobilenet_v3_small()
resnext50_32x4d = models.resnext50_32x4d()
wide_resnet50_2 = models.wide_resnet50_2()
mnasnet = models.mnasnet1_0()
efficientnet_b0 = models.efficientnet_b0()
efficientnet_b1 = models.efficientnet_b1()
efficientnet_b2 = models.efficientnet_b2()
efficientnet_b3 = models.efficientnet_b3()
efficientnet_b4 = models.efficientnet_b4()
efficientnet_b5 = models.efficientnet_b5()
efficientnet_b6 = models.efficientnet_b6()
efficientnet_b7 = models.efficientnet_b7()
regnet_y_400mf = models.regnet_y_400mf()
regnet_y_800mf = models.regnet_y_800mf()
regnet_y_1_6gf = models.regnet_y_1_6gf()
regnet_y_3_2gf = models.regnet_y_3_2gf()
regnet_y_8gf = models.regnet_y_8gf()
regnet_y_16gf = models.regnet_y_16gf()
regnet_y_32gf = models.regnet_y_32gf()
regnet_x_400mf = models.regnet_x_400mf()
regnet_x_800mf = models.regnet_x_800mf()
regnet_x_1_6gf = models.regnet_x_1_6gf()
regnet_x_3_2gf = models.regnet_x_3_2gf()
regnet_x_8gf = models.regnet_x_8gf()
regnet_x_16gf = models.regnet_x_16gf()
regnet_x_32gf = models.regnet_x_32gf()

We provide pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet_v2 = models.mobilenet_v2(pretrained=True)
mobilenet_v3_large = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3_small = models.mobilenet_v3_small(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
mnasnet = models.mnasnet1_0(pretrained=True)
efficientnet_b0 = models.efficientnet_b0(pretrained=True)
efficientnet_b1 = models.efficientnet_b1(pretrained=True)
efficientnet_b2 = models.efficientnet_b2(pretrained=True)
efficientnet_b3 = models.efficientnet_b3(pretrained=True)
efficientnet_b4 = models.efficientnet_b4(pretrained=True)
efficientnet_b5 = models.efficientnet_b5(pretrained=True)
efficientnet_b6 = models.efficientnet_b6(pretrained=True)
efficientnet_b7 = models.efficientnet_b7(pretrained=True)
regnet_y_400mf = models.regnet_y_400mf(pretrained=True)
regnet_y_800mf = models.regnet_y_800mf(pretrained=True)
regnet_y_1_6gf = models.regnet_y_1_6gf(pretrained=True)
regnet_y_3_2gf = models.regnet_y_3_2gf(pretrained=True)
regnet_y_8gf = models.regnet_y_8gf(pretrained=True)
regnet_y_16gf = models.regnet_y_16gf(pretrained=True)
regnet_y_32gf = models.regnet_y_32gf(pretrained=True)
regnet_x_400mf = models.regnet_x_400mf(pretrained=True)
regnet_x_800mf = models.regnet_x_800mf(pretrained=True)
regnet_x_1_6gf = models.regnet_x_1_6gf(pretrained=True)
regnet_x_3_2gf = models.regnet_x_3_2gf(pretrained=True)
regnet_x_8gf = models.regnet_x_8gf(pretrained=True)
regnet_x_16gf = models.regnet_x_16gf(pretrained=True)
regnet_x_32gf = models.regnet_x_32gf(pretrained=True)

Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.
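For example, a minimal sketch of redirecting the cache before the first download (the path below is a placeholder; the variable name follows the note above, while newer PyTorch versions use TORCH_HOME instead):

import os
os.environ['TORCH_MODEL_ZOO'] = '/tmp/torch_models'  # placeholder cache path

import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)  # weights are cached in the directory above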

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.
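For instance, a minimal sketch of switching a pretrained classifier into evaluation mode before inference (the random input is purely illustrative):

import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
model.eval()  # batch norm and dropout now use inference behavior

with torch.no_grad():  # gradients are not needed for inference
    out = model(torch.rand(1, 3, 224, 224))  # dummy 3-channel batch
print(out.shape)  # torch.Size([1, 1000]), one logit per ImageNet class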

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

An example of such normalization can be found in the imagenet example here
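Putting the normalization together with model inference, a minimal end-to-end sketch (dog.jpg is a placeholder file name):

from PIL import Image
import torch
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("dog.jpg")           # placeholder image path
batch = preprocess(img).unsqueeze(0)  # shape (1, 3, 224, 224)

model = models.resnet18(pretrained=True)
model.eval()
with torch.no_grad():
    probs = model(batch).softmax(dim=1)
print(probs.topk(5))                  # top-5 probabilities and class indices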

The process for obtaining the values of mean and std is roughly equivalent to:

import torch
from torchvision import datasets, transforms as T

transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
dataset = datasets.ImageNet(".", split="train", transform=transform)

means = []
stds = []
for img in subset(dataset):
    means.append(torch.mean(img))
    stds.append(torch.std(img))

mean = torch.mean(torch.tensor(means))
std = torch.mean(torch.tensor(stds))

Unfortunately, the concrete subset that was used is lost. For more information see this discussion or these experiments.

The sizes of the EfficientNet models depend on the variant. For the exact input sizes check here

ImageNet 1-crop accuracies (%)

Model                              Acc@1     Acc@5
AlexNet                            56.522    79.066
VGG-11                             69.020    88.628
VGG-13                             69.928    89.246
VGG-16                             71.592    90.382
VGG-19                             72.376    90.876
VGG-11 with batch normalization    70.370    89.810
VGG-13 with batch normalization    71.586    90.374
VGG-16 with batch normalization    73.360    91.516
VGG-19 with batch normalization    74.218    91.842
ResNet-18                          69.758    89.078
ResNet-34                          73.314    91.420
ResNet-50                          76.130    92.862
ResNet-101                         77.374    93.546
ResNet-152                         78.312    94.046
SqueezeNet 1.0                     58.092    80.420
SqueezeNet 1.1                     58.178    80.624
Densenet-121                       74.434    91.972
Densenet-169                       75.600    92.806
Densenet-201                       76.896    93.370
Densenet-161                       77.138    93.560
Inception v3                       77.294    93.450
GoogleNet                          69.778    89.530
ShuffleNet V2 x1.0                 69.362    88.316
ShuffleNet V2 x0.5                 60.552    81.746
MobileNet V2                       71.878    90.286
MobileNet V3 Large                 74.042    91.340
MobileNet V3 Small                 67.668    87.402
ResNeXt-50-32x4d                   77.618    93.698
ResNeXt-101-32x8d                  79.312    94.526
Wide ResNet-50-2                   78.468    94.086
Wide ResNet-101-2                  78.848    94.284
MNASNet 1.0                        73.456    91.510
MNASNet 0.5                        67.734    87.490
EfficientNet-B0                    77.692    93.532
EfficientNet-B1                    78.642    94.186
EfficientNet-B2                    80.608    95.310
EfficientNet-B3                    82.008    96.054
EfficientNet-B4                    83.384    96.594
EfficientNet-B5                    83.444    96.628
EfficientNet-B6                    84.008    96.916
EfficientNet-B7                    84.122    96.908
regnet_x_400mf                     72.834    90.950
regnet_x_800mf                     75.212    92.348
regnet_x_1_6gf                     77.040    93.440
regnet_x_3_2gf                     78.364    93.992
regnet_x_8gf                       79.344    94.686
regnet_x_16gf                      80.058    94.944
regnet_x_32gf                      80.622    95.248
regnet_y_400mf                     74.046    91.716
regnet_y_800mf                     76.420    93.136
regnet_y_1_6gf                     77.950    93.966
regnet_y_3_2gf                     78.948    94.576
regnet_y_8gf                       80.032    95.048
regnet_y_16gf                      80.424    95.240
regnet_y_32gf                      80.878    95.340

AlexNet¶

torchvision.models.alexnet(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.alexnet.AlexNet[source]¶

AlexNet model architecture from the “One weird trick…” paper. The required minimum input size of the model is 63x63.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

VGG¶

torchvision.models.vgg11(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 11-layer model (configuration “A”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg11_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 11-layer model (configuration “A”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg13(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 13-layer model (configuration “B”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg13_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 13-layer model (configuration “B”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg16(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 16-layer model (configuration “D”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg16_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 16-layer model (configuration “D”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg19(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 19-layer model (configuration “E”) from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.vgg19_bn(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.vgg.VGG[source]¶

VGG 19-layer model (configuration “E”) with batch normalization, from “Very Deep Convolutional Networks For Large-Scale Image Recognition”. The required minimum input size of the model is 32x32.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

ResNet¶

torchvision.models.resnet18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet[source]¶

ResNet-18 model from “Deep Residual Learning for Image Recognition”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

Examples using resnet18:

Tensor transforms and JIT¶
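Beyond the examples above, a common use of these constructors is transfer learning. A minimal sketch, assuming a hypothetical 10-class downstream task, that freezes the pretrained backbone and replaces the final fully connected layer:

import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)
for p in model.parameters():
    p.requires_grad = False  # freeze the pretrained backbone

# swap the 1000-way ImageNet head for a fresh 10-class head (illustrative)
model.fc = nn.Linear(model.fc.in_features, 10)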

torchvision.models.resnet34(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet[source]¶

ResNet-34 model from “Deep Residual Learning for Image Recognition”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet50(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet[source]¶

ResNet-50 model from “Deep Residual Learning for Image Recognition”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet101(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet[source]¶

ResNet-101 model from “Deep Residual Learning for Image Recognition”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

torchvision.models.resnet152(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.resnet.ResNet[source]¶

ResNet-152 model from “Deep Residual Learning for Image Recognition”.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

SqueezeNet¶

torchvision.models.squeezenet1_0(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.squeezenet.SqueezeNet[source]¶

SqueezeNet model architecture from the “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size” paper. The required minimum input size of the model is 21x21.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

progress (bool) – If True, displays a progress bar of the download to stderr

Faster R-CNN¶

torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)[source]¶

Constructs a Faster R-CNN model with a ResNet-50-FPN backbone.

Reference: “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:

boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the class label for each ground-truth box

The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image, with fields boxes, labels and scores.

Example:

>>> model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
>>> # For training
>>> images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
>>> boxes[:, :, 2:4] = boxes[:, :, 0:2] + boxes[:, :, 2:4]
>>> labels = torch.randint(1, 91, (4, 11))
>>> images = list(image for image in images)
>>> targets = []
>>> for i in range(len(images)):
>>>     d = {}
>>>     d['boxes'] = boxes[i]
>>>     d['labels'] = labels[i]
>>>     targets.append(d)
>>> output = model(images, targets)
>>> # For inference
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)
>>>
>>> # optionally, if you want to export the model to ONNX:
>>> torch.onnx.export(model, x, "faster_rcnn.onnx", opset_version=11)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
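To adapt the detector to a custom dataset, one common recipe is to swap in a new box predictor with the desired number of classes. A minimal sketch, assuming an illustrative 3-class problem (2 object classes plus background):

from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(pretrained=True)

# input feature size of the existing box predictor
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace it: 3 = 2 object classes + background (illustrative count)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=3)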

Examples using fasterrcnn_resnet50_fpn:

Repurposing masks into bounding boxes¶

Visualization utilities¶

torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)[source]¶

Constructs a high resolution Faster R-CNN model with a MobileNetV3-Large FPN backbone. It works similarly to Faster R-CNN with ResNet-50 FPN backbone. See fasterrcnn_resnet50_fpn() for more details.

Example:

>>> model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 6, with 6 meaning all backbone layers are trainable.

torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)[source]¶

Constructs a low resolution Faster R-CNN model with a MobileNetV3-Large FPN backbone tuned for mobile use-cases. It works similarly to Faster R-CNN with ResNet-50 FPN backbone. See fasterrcnn_resnet50_fpn() for more details.

Example:

>>> model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 6, with 6 meaning all backbone layers are trainable.

RetinaNet¶

torchvision.models.detection.retinanet_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)[source]¶

Constructs a RetinaNet model with a ResNet-50-FPN backbone.

Reference: “Focal Loss for Dense Object Detection”.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:

boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the class label for each ground-truth box

The model returns a Dict[Tensor] during training, containing the classification and regression losses.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detections:

boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the predicted labels for each detection

scores (Tensor[N]): the scores of each detection

Example:

>>> model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
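Since inference returns a List[Dict[Tensor]] as described above, a minimal sketch of consuming the predictions (the 0.5 score cutoff is an arbitrary illustration):

import torch
from torchvision.models.detection import retinanet_resnet50_fpn

model = retinanet_resnet50_fpn(pretrained=True)
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 300, 400)])

pred = predictions[0]            # Dict[Tensor] for the first image
keep = pred['scores'] > 0.5      # arbitrary confidence threshold
boxes = pred['boxes'][keep]      # (M, 4) boxes in [x1, y1, x2, y2] format
labels = pred['labels'][keep]    # (M,) predicted class indices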

Examples using retinanet_resnet50_fpn:

Visualization utilities¶

SSD¶

torchvision.models.detection.ssd300_vgg16(pretrained: bool = False, progress: bool = True, num_classes: int = 91, pretrained_backbone: bool = True, trainable_backbone_layers: Optional[int] = None, **kwargs: Any)[source]¶

Constructs an SSD model with input size 300x300 and a VGG16 backbone.

Reference: “SSD: Single Shot MultiBox Detector”.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes, but they will be resized to a fixed size before being passed to the backbone.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:

boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the class label for each ground-truth box

The model returns a Dict[Tensor] during training, containing the classification and regression losses.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detections:

boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the predicted labels for each detection

scores (Tensor[N]): the scores of each detection

Example:

>>> model = torchvision.models.detection.ssd300_vgg16(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using ssd300_vgg16:

Visualization utilities¶

SSDlite¶

torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained: bool = False, progress: bool = True, num_classes: int = 91, pretrained_backbone: bool = False, trainable_backbone_layers: Optional[int] = None, norm_layer: Optional[Callable[[…], torch.nn.modules.module.Module]] = None, **kwargs: Any)[source]¶

Constructs an SSDlite model with input size 320x320 and a MobileNetV3 Large backbone, as described at “Searching for MobileNetV3” and “MobileNetV2: Inverted Residuals and Linear Bottlenecks”.

See ssd300_vgg16() for more details.

Example:

>>> model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
>>> predictions = model(x)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 6, with 6 meaning all backbone layers are trainable.

norm_layer (callable, optional) – Module specifying the normalization layer to use.

Examples using ssdlite320_mobilenet_v3_large:

Visualization utilities¶

Mask R-CNN¶

torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=91, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)[source]¶

Constructs a Mask R-CNN model with a ResNet-50-FPN backbone.

Reference: “Mask R-CNN”.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:

boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the class label for each ground-truth box

masks (UInt8Tensor[N, H, W]): the segmentation binary masks for each instance

The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN, and the mask loss.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detected instances:

boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the predicted labels for each instance

scores (Tensor[N]): the scores of each instance

masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to obtain the final segmentation masks, the soft masks can be thresholded, generally with a value of 0.5 (mask >= 0.5).

Example:

>>> model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)
>>>
>>> # optionally, if you want to export the model to ONNX:
>>> torch.onnx.export(model, x, "mask_rcnn.onnx", opset_version=11)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.
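The predicted soft masks can then be binarized as described above. A minimal sketch (random input, purely illustrative):

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
with torch.no_grad():
    pred = model([torch.rand(3, 300, 400)])[0]

# pred['masks'] is (N, 1, H, W) with probabilities in [0, 1];
# threshold at 0.5 to obtain one binary (H, W) mask per instance
binary_masks = (pred['masks'] > 0.5).squeeze(1)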

Examples using maskrcnn_resnet50_fpn:

Visualization utilities¶

Keypoint R-CNN¶

torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=2, num_keypoints=17, pretrained_backbone=True, trainable_backbone_layers=None, **kwargs)[source]¶

Constructs a Keypoint R-CNN model with a ResNet-50-FPN backbone.

Reference: “Mask R-CNN”.

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each image, and should be in 0-1 range. Different images can have different sizes.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and a targets argument (a list of dictionaries), containing:

boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the class label for each ground-truth box

keypoints (FloatTensor[N, K, 3]): the K keypoint locations for each of the N instances, in the format [x, y, visibility], where visibility=0 means that the keypoint is not visible

The model returns a Dict[Tensor] during training, containing the classification and regression losses for both the RPN and the R-CNN, and the keypoint loss.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows, where N is the number of detected instances:

boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H

labels (Int64Tensor[N]): the predicted labels for each instance

scores (Tensor[N]): the scores of each instance

keypoints (FloatTensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format

Example:

>>> model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
>>> model.eval()
>>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
>>> predictions = model(x)
>>>
>>> # optionally, if you want to export the model to ONNX:
>>> torch.onnx.export(model, x, "keypoint_rcnn.onnx", opset_version=11)

Parameters

pretrained (bool) – If True, returns a model pre-trained on COCO train2017

progress (bool) – If True, displays a progress bar of the download to stderr

num_classes (int) – number of output classes of the model (including the background)

num_keypoints (int) – number of keypoints, default 17

pretrained_backbone (bool) – If True, returns a model with backbone pre-trained on Imagenet

trainable_backbone_layers (int) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable.

Examples using keypointrcnn_resnet50_fpn:

Visualization utilities¶

Video classification¶

We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.

All pre-trained models expect input videos normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is the number of video frames in a clip. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].

Note

The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.

Note

For now, normalization code can be found in references/video_classification/transforms.py, see the Normalize function there. Note that it differs from standard normalization for images because it assumes the video is 4d.
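In the absence of that reference code here, a minimal sketch of the 4d normalization it describes, using the Kinetics-400 statistics above:

import torch

mean = torch.tensor([0.43216, 0.394666, 0.37645])
std = torch.tensor([0.22803, 0.22145, 0.216989])

clip = torch.rand(3, 16, 112, 112)  # dummy (C, T, H, W) clip in [0, 1]
# broadcast the per-channel statistics across the T, H, W dimensions
normalized = (clip - mean[:, None, None, None]) / std[:, None, None, None]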

Kinetics 1-crop accuracies for clip length 16 (16x112x112)

Network          Clip acc@1    Clip acc@5
ResNet 3D 18     52.75         75.45
ResNet MC 18     53.90         76.29
ResNet (2+1)D    57.50         78.81

ResNet 3D¶

torchvision.models.video.r3d_18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.video.resnet.VideoResNet[source]¶

Constructs an 18-layer ResNet3D model, as described in https://arxiv.org/abs/1711.11248.

Parameters

pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

progress (bool) – If True, displays a progress bar of the download to stderr

Returns

R3D-18 network

Return type

nn.Module
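A minimal usage sketch (random clip, purely illustrative; real inputs should be normalized as described above):

import torch
import torchvision.models.video as video_models

model = video_models.r3d_18(pretrained=True)
model.eval()
clip = torch.rand(1, 3, 16, 112, 112)  # (batch, C, T, H, W)
with torch.no_grad():
    scores = model(clip)
print(scores.shape)  # torch.Size([1, 400]), one score per Kinetics-400 class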

ResNet Mixed Convolution¶

torchvision.models.video.mc3_18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.video.resnet.VideoResNet[source]¶

Constructs an 18-layer Mixed Convolution network, as described in https://arxiv.org/abs/1711.11248.

Parameters

pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

progress (bool) – If True, displays a progress bar of the download to stderr

Returns

MC3 Network definition

Return type

nn.Module

ResNet (2+1)D¶

torchvision.models.video.r2plus1d_18(pretrained: bool = False, progress: bool = True, **kwargs: Any) → torchvision.models.video.resnet.VideoResNet[source]¶

Constructs an 18-layer deep R(2+1)D network, as described in https://arxiv.org/abs/1711.11248.

Parameters

pretrained (bool) – If True, returns a model pre-trained on Kinetics-400

progress (bool) – If True, displays a progress bar of the download to stderr

Returns

R(2+1)D-18 network

Return type

nn.Module


